Modern supercomputers consist of a collection of more or less equal 'computing nodes', each node is equipped with one or more processors, memory, networking hardware and so on. In principle, such a node is not too different from a normal high-end PC.
In order to use the nodes efficiently, usage is made of a batch system. The concepts involved are not too difficult to understand, but there is a learning curve to get used to it. Moreover, there are some issues that are common to all batch systems, so that is why we wrote this Batch-HOWTO. We will explain what a batch system is, how to work with it and what the important details are. This document is by no means exhaustive and many important details and possibilities are not mentioned, but we hope that it suffices to give you a clear picture what a batch system is.
Practically everybody is used to interactive computer systems: word processing, using the Internet, compiling and running programs, etc. Here one types in commands or clicks on some buttons, and the computer system gives an immediate response.
In a batch system however, one has to prepare a file containing the commands ( a 'job' ) that will be executed later. These commands invoke computer programs, define the location of files and so on. In general, the commands will be executed on another system, possible even a system in another country. Here we will restrict ourself to the situation that the jobs will run on the same system, albeit on other nodes than the node the users login to. The advantages of a batch system are:
These terms can be loosely defined as:
A batch system is responsible for allocating cores, processors or nodes to a job. It depends on the system what kind of granularity is used. For example: in a system consisting of nodes, each with two single-core processors, it is customary to allocate a whole number of nodes to a job. On the other hand, if a node contains 64 or more processors, one can choose to allocate a number of processors to a job, allowing more than one job to run simultaneously on one node.
Various batch systems exist, but they all share the following parts:
A job can be classified using two parameters:
Next to these requirements, one can specify more details: the amount of memory that is needed; request special hardware (fast network adapters for example); specify the name of the standard output file; and so on. The user prepares a file wherein the job-specifications are made in a special language. This language differs from one job system to another. Here follow two examples for a job needing 2 nodes for 1 hour:
#PBS -lwalltime=1:00:00
#PBS -lnodes=2
# @ wall_clock_limit = 1:00:00
# @ node = 2
The things to be done are to be specified in a shell script. The shell can be bash, ksh, sh, csh and tcsh. We recommend to use in a job the same shell as the interactive login shell which normally will be bash. A typical shell script could be:
echo "Start of job at `date`"
cd $HOME/workdir
./myprog one two three
echo "End of job at `date`"
So, in order to use a batch system, one has to have a basic knowledge of shell programming. There are many bash tutorials on the web, here is an example.
To prepare and submit a job, one logs in to a login node, using some communication program (winscp, putty, ssh). Example for a UNIX workstation, connecting to the virgo system at HPCE:
ssh
and, using an editor (vi, emacs, joe) one puts the description of the requirements of a job and the shell script in one file. An example for the Torque batch system:
The contents of a file called 'myjob':
#PBS -lwalltime=3:00:00
#PBS -lnodes=2
echo "Start of job"
cd $HOME/workdir
./myprog phase1
echo "End of job"
To submit this job to the Torque batch system, issue the command:
qsub myjob
The system will take care that the job is run on two nodes and will abort the job after 3 hours since the start of the job, if it is still running. The job system assumes that the job is ended when the job script is ended. The accounted system usage is always the number of nodes or processors times the number of wall clock hours the job is running.
It is possible to write a computer program that uses more than one core at the same time in parallel. Such a program will need less time if there are more cores available. MPI and OpenMP are often used tools for creating parallel programs. Note that only parallelized programs can make use of more than one core. It is useless to ask for more than one core if the program is not parallelized.
Every job system offers some commands to monitor the status of a job. Have a look at the documentation for the system you are using.
After a job has been submitted there are various possibilities to manage it. The most important one is canceling a submitted job: removing it from the queue if it not running yet, or removing it when it is running. Sometimes there are possibilities to change the requirements of an already submitted job.
Most batch systems arrange that jobs are automatically resubmitted after a system failure. This is not always desirable, and there is always a way to prevent automatic rescheduling of your jobs, look at the appropriate documentation.
Depending on the system, it can be important to optimize a job with respect to the usage of the input, output or intermediate (scratch) files. One can imagine, that a load of hundreds of jobs, all frequently accessing a database on the home file system can be very bad for the performance of the I/O performance if this file system is accessed by NFS or a similar system. Therefore it is wise to put frequent accessed files onto a special scratch file system. On a cluster this could be the local disks available in a node. Detailed information is available in the documentation of the various systems.
If the job system allocates a whole number of nodes to a job, and one wants to run serial (not parallelized) jobs, the usage of the resources can be bad. For example, if a node contains 8 cores, only 12.5% of the available CPU power would be used. Given the fact that submitters of serial jobs always want to run many of those (otherwise they would not need a supercomputer), the solution to enhance the efficiency is not difficult, without parallelizing the serial program. Here follows an example how to run 8 processes simultaneously on a eight-core node:
for i in {1..8}
do
cd $HOME/workdir.$i
myprog input output 2>&1 &
done
wait
In this example, 8 instances of the program `myprog` are started in the background, each running in a different working directory. The wait command is essential: without it, the job script itself would end almost immediately, and the job system would decide that the job is ended and presumably kill all the myprog processes. In practice, one would write a script to generate this kind of jobs.
Sometimes it comes in handy if one can determine in a script if this is running as part of a batch job or interactively. A general way to set an environment variable is as follows:
unset INTERACTIVE
/usr/bin/tty > /dev/null 2>&1 && export INTERACTIVE=1
The environment variable INTERACTIVE will be set in an interactive environment, and unset in a batch environment.
Copyright © 2014 - All Rights Reserved - Domain Name
Template by OS Templates